Similarity-Based Methods For Word Sense Disambiguation

ثبت نشده

چکیده

We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-based estimates. 1 I n t r o d u c t i o n The problem of data sparseness affects all statistical methods for natural language processing. Even large training sets tend to misrepresent low-probability events, since rare events may not appear in the training corpus at all. We concentrate here on the problem of estimating the probability of unseen word pairs, that is, pairs that do not occur in the training set. Katz's back-off scheme (Katz, 1987), widely used in bigram language modeling, estimates the probability of an unseen bigram by utilizing unigram estimates. This has the undesirable result of assigning unseen bigrams the same probability if they are made up of unigrams of the same frequency. Class-based methods (Brown et al., 1992; Pereira, Tishby, and Lee, 1993; Resnik, 1992) cluster words into classes of similar words, so that one can base the estimate of a word pair's probability on the averaged cooccurrence probability of the classes to which the two words belong. However, a word is therefore modeled by the average behavior of many words, which may cause the given word's idiosyncrasies to be ignored. For instance, the word "red" might well act like a generic color word in most cases, but it has distinctive cooccurrence patterns with respect to words like "apple," "banana," and so

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Similarity-Based Methods for Word Sense Disambiguation

متن کامل

A new semantic similarity measure evaluated in word sense disambiguation

In this paper, a new conceptual hierarchy based semantic similarity measure is presented, and it is evaluated in word sense disambiguation using a well known algorithm which is called Maximum Relatedness Disambiguation. In this study, WordNet’s conceptual hierarchy is utilized as the data source, but the methods presented are suitable to other resources.

متن کامل

EBL-Hope: Multilingual Word Sense Disambiguation Using a Hybrid Knowledge-Based Technique

We present a hybrid knowledge-based approach to multilingual word sense disambiguation using BabelNet. Our approach is based on a hybrid technique derived from the modified version of the Lesk algorithm and the Jiang & Conrath similarity measure. We present our system's runs for the word sense disambiguation subtask of the Multilingual Word Sense Disambiguation and Entity Linking task of SemEva...

متن کامل

Tiantianzhu7: System Description of Semantic Textual Similarity (STS) in the SemEval-2012 (Task 6)

This paper briefly reports our submissions to the Semantic Textual Similarity (STS) task in the SemEval 2012 (Task 6). We first use knowledge-based methods to compute word semantic similarity as well as Word Sense Disambiguation (WSD). We also consider word order similarity from the structure of the sentence. Finally we sum up several aspects of similarity with different coefficients and get th...

متن کامل

Semantic Similarity Functions in Word Sense Disambiguation

This paper presents a method of improving the results of automatic Word Sense Disambiguation by generalizing nouns appearing in a disambiguated context to concepts. A corpus-based semantic similarity function is used for that purpose, by substituting appearances of particular nouns with a set of the most closely related similar words. We show that this approach may be applied to both supervised...

متن کامل

Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics

We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based secondorder similarity networks. We then add features for disambiguation from het...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2002

Similarity-Based Methods For Word Sense Disambiguation

ثبت نشده

چکیده

منابع مشابه

Similarity-Based Methods for Word Sense Disambiguation

A new semantic similarity measure evaluated in word sense disambiguation

EBL-Hope: Multilingual Word Sense Disambiguation Using a Hybrid Knowledge-Based Technique

Tiantianzhu7: System Description of Semantic Textual Similarity (STS) in the SemEval-2012 (Task 6)

Semantic Similarity Functions in Word Sense Disambiguation

Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics

عنوان ژورنال:

اشتراک گذاری